Search Result


Online task scheduling algorithm for big data analytics based on cumulative running work

LI Yefei, XU Chao, XU Daoqiang, ZOU Yunfeng, ZHANG Xiaoda, QIAN Zhuzhong

Journal of Computer Applications 2019, 39 (8): 2431-2437. DOI: 10.11772/j.issn.1001-9081.2019010073

A task scheduler based on Cumulative Running Work (CRW), named CRWScheduler, was proposed to process tasks effectively without any prior knowledge on big data analytics platforms such as Hadoop and Spark. Based on its CRW, a running job was moved from a low-weight queue to a high-weight queue. When resources were allocated to a job, both the queue the job belonged to and the job's instantaneous resource utilization were considered, which significantly improved overall system performance without prior knowledge. A prototype of CRWScheduler was implemented on Apache Hadoop YARN. Experimental results on a 28-node benchmark cluster show that, compared with the YARN fair scheduler, CRWScheduler reduces the average Job Flow Time (JFT) by 21% and the 95th-percentile JFT by up to 35%. Further improvements can be obtained when CRWScheduler is combined with task-level schedulers.
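The abstract describes CRWScheduler only at a high level. The following is a minimal Python sketch of the idea under stated assumptions, not the authors' implementation: CRW is assumed to be resource usage integrated over time (for example container-seconds), the CRW thresholds and queue weights are hypothetical values, and `pick_next` is a hypothetical rule that combines queue weight with instantaneous usage. It does not use any YARN API.

```python
from dataclasses import dataclass

# Hypothetical CRW thresholds (resource-seconds) separating the queues.
# Queue 0 is the lowest-weight queue; following the abstract, a job moves
# toward higher-weight queues as its CRW grows.
CRW_THRESHOLDS = [100.0, 1000.0, 10000.0]
QUEUE_WEIGHTS = [1.0, 2.0, 4.0, 8.0]  # hypothetical weight per queue


@dataclass
class Job:
    name: str
    crw: float = 0.0    # cumulative running work observed so far
    usage: float = 0.0  # instantaneous resource usage (e.g. containers held)


def queue_index(crw: float) -> int:
    """Return the queue a job belongs to, given its CRW."""
    return sum(1 for t in CRW_THRESHOLDS if crw >= t)


def tick(jobs: list[Job], dt: float) -> None:
    """Accumulate CRW: add current usage integrated over an interval dt."""
    for job in jobs:
        job.crw += job.usage * dt


def pick_next(jobs: list[Job]) -> Job:
    """Choose the job that receives the next free resource.

    Hypothetical rule: prefer the highest-weight queue, then break ties
    by picking the job with the lowest instantaneous usage.
    """
    return max(jobs, key=lambda j: (QUEUE_WEIGHTS[queue_index(j.crw)], -j.usage))
```

For example, with jobs `a` (CRW 0, usage 5), `b` (CRW 150, usage 10) and `c` (CRW 150, usage 2), `b` and `c` sit in queue 1 and `c` is picked because it currently holds fewer resources; after `tick([a], 30.0)`, job `a` has accumulated 150 units of CRW and also moves to queue 1.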

Fast outlier detection algorithm based on local density

ZOU Yunfeng, ZHANG Xin, SONG Shiyuan, NI Weiwei

Journal of Computer Applications 2017, 37 (10): 2932-2937. DOI: 10.11772/j.issn.1001-9081.2017.10.2932

Outlier mining aims to find exceptional objects that deviate from most of the rest of a data set. Density-based outlier detection has attracted much attention, but the density-based Local Outlier Factor (LOF) algorithm is not suitable for data sets with abnormal distributions. The INFLuenced Outlierness (INFLO) algorithm solves this problem by analyzing both the k nearest neighbors and the reverse k nearest neighbors of each data point, at the cost of lower efficiency. To address this, a local density-based algorithm named Local Density Based Outlier detection (LDBO) was proposed, which improves outlier detection efficiency and effectiveness simultaneously. LDBO introduced the definitions of strong k nearest neighbors and weak k nearest neighbors to analyze the outlier relations among nearby data points. Furthermore, to improve detection efficiency, a prejudgement step was applied to avoid unnecessary reverse k nearest neighbor analysis as far as possible. Theoretical analysis and experimental results indicate that LDBO is more efficient than INFLO and that it is effective and feasible.
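The abstract does not define LDBO's strong and weak k-nearest-neighbor relations or its prejudgement rule, so the following sketch shows only the INFLO baseline it is compared against. It illustrates why that baseline is costly: every point needs both its k nearest neighbors and its reverse k nearest neighbors. This is a simplified brute-force sketch, assuming Euclidean distance, distinct points (so that no k-distance is zero), and exactly k neighbors per point with ties broken by index. Density is taken as 1 / k-distance, and the INFLO score is the average density of the influence space (kNN ∪ RkNN) divided by the point's own density.

```python
import math


def knn(points, i, k):
    """Brute-force k nearest neighbors of points[i] and its k-distance."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j for _, j in dists[:k]}, dists[k - 1][0]


def inflo_scores(points, k):
    """INFLO score per point; values well above 1 suggest outliers."""
    n = len(points)
    nn, den = [], []
    for i in range(n):
        neighbors, kdist = knn(points, i, k)
        nn.append(neighbors)
        den.append(1.0 / kdist)  # assumes distinct points (kdist > 0)
    # Reverse kNN: j is a reverse neighbor of i if i is among j's kNN.
    rnn = [set() for _ in range(n)]
    for j in range(n):
        for i in nn[j]:
            rnn[i].add(j)
    scores = []
    for i in range(n):
        influence = nn[i] | rnn[i]  # influence space of point i
        avg_den = sum(den[j] for j in influence) / len(influence)
        scores.append(avg_den / den[i])
    return scores
```

On four points forming a unit square plus one distant point at `(10, 10)` with `k = 2`, the distant point receives a score of about 13.5, while every square corner scores at most 1. The nested reverse-neighbor pass over all points is the work that LDBO's prejudgement is described as reducing.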
